A Novel Algorithm for Mining Rare-Utility Itemsets in a Multi-Database Environment

Authors

  • Guo-Cheng Lan
  • Tzung-Pei Hong
  • Vincent S. Tseng

Abstract

Utility mining has recently emerged as an important topic in data mining. It discovers high-utility itemsets by taking into account both the profits and the purchased quantities of items. In some situations, rarely occurring items may co-occur closely with specific high-utility items, and such utility itemsets containing rare items can also provide useful information to decision makers. Most existing utility-mining methods were designed for a single (centralized) database and are not suitable for environments with multiple data sources, such as a chain-store enterprise. Moreover, existing methods do not consider the existing periods and existing branches of items, that is, when and in which branches items exist. In this paper, we therefore propose a new kind of pattern, called Rare Utility Itemsets, which considers not only individual profits and quantities but also the common existing periods and branches of items in a multi-database environment. We also propose a new mining approach, TP-RUI-MD (Two-Phase Algorithm for Mining Rare Utility Itemsets in Multiple Databases), to efficiently discover rare utility itemsets. To the best of our knowledge, this is the first work on mining rare utility itemsets in a multi-database environment. Finally, a series of experiments shows that the proposed approach performs well under a variety of system conditions.
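The abstract does not describe the mining procedure itself. The Python sketch below is only an illustration under stated assumptions: it uses the classic two-phase scheme from the utility-mining literature, in which a transaction-weighted utilization (TWU) upper bound prunes candidates before an exact utility scan. The name TP-RUI-MD suggests a two-phase design, but this is not the authors' algorithm. The sketch treats an itemset as rare-utility when its utility reaches `min_util` while its support stays below `max_sup`, and it assumes a hypothetical branch rule: support is measured only over transactions from branches that sell every item of the itemset. Existing periods are not modeled, and the data, thresholds, and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data (not from the paper): unit profit per item and one
# transaction database per branch; each transaction maps item -> quantity.
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 10}
BRANCH_DBS = {
    "branch1": [{"A": 1, "C": 2}, {"B": 3, "C": 1}, {"A": 2, "D": 1}],
    "branch2": [{"B": 1, "D": 2}, {"A": 1, "B": 2, "D": 1}],
}


def transaction_utility(txn):
    """TU(T): sum of quantity * unit profit over every item in T."""
    return sum(qty * PROFIT[item] for item, qty in txn.items())


def itemset_utility(itemset, txn):
    """u(X, T): utility of X in T; the caller ensures T contains all of X."""
    return sum(txn[item] * PROFIT[item] for item in itemset)


def contains(txn, itemset):
    return all(item in txn for item in itemset)


def common_transactions(itemset, dbs):
    """Assumed rule: pool transactions only from branches selling every item of X."""
    pooled = []
    for db in dbs.values():
        sold = {item for txn in db for item in txn}
        if set(itemset) <= sold:
            pooled.extend(db)
    return pooled


def next_level(kept):
    """Apriori-style join of sorted k-itemsets sharing their first k-1 items."""
    kept_set = set(kept)
    joined = []
    for a, b in combinations(sorted(kept_set), 2):
        if a[:-1] == b[:-1]:
            cand = a + (b[-1],)
            # Prune candidates with a k-subset that failed the TWU test.
            if all(sub in kept_set for sub in combinations(cand, len(cand) - 1)):
                joined.append(cand)
    return joined


def two_phase_rare_utility(dbs, min_util, max_sup):
    items = sorted({item for db in dbs.values() for txn in db for item in txn})

    # Phase 1: level-wise search on transaction-weighted utilization (TWU),
    # an upper bound on utility that never grows for supersets.
    candidates, level = [], [(item,) for item in items]
    while level:
        kept = []
        for X in level:
            twu = sum(transaction_utility(t)
                      for t in common_transactions(X, dbs) if contains(t, X))
            if twu >= min_util:
                kept.append(X)
        candidates.extend(kept)
        level = next_level(kept)

    # Phase 2: exact utility and support, measured over the pooled branches.
    result = {}
    for X in candidates:
        pooled = common_transactions(X, dbs)
        hits = [t for t in pooled if contains(t, X)]
        if not hits:
            continue
        utility = sum(itemset_utility(X, t) for t in hits)
        support = len(hits) / len(pooled)
        # Keep itemsets that reach the utility threshold yet are rare.
        if utility >= min_util and support < max_sup:
            result[X] = utility
    return result


print(two_phase_rare_utility(BRANCH_DBS, min_util=20, max_sup=0.5))
# {('A', 'D'): 35, ('B', 'D'): 36}
```

A transaction that contains an itemset necessarily comes from a branch selling all of its items. For that reason, the assumed branch rule leaves utilities and TWU values unchanged and only shrinks the support denominator. On the toy data, {A, D} and {B, D} each reach a utility of at least 20 while appearing in only 2 of 5 pooled transactions. {A} and {D} are excluded because their support is 3/5.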

Related articles

Data sanitization in association rule mining based on impact factor

Data sanitization is a process that is used to promote the sharing of transactional databases among organizations and businesses; it alleviates concerns for individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...

A New Algorithm for High Average-utility Itemset Mining

High utility itemset mining (HUIM) is a newly emerging field in data mining which has gained growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds a minimum threshold. The basic HUIM problem does not consider the length of itemsets in its utility measurement, and utility values tend to become higher for itemsets containing more items...

A Fuzzy Algorithm for Mining High Utility Rare Itemsets – FHURI

Classical frequent itemset mining identifies frequent itemsets in transaction databases using only the frequency of item occurrences, without considering the utility of items. In many real-world situations, the utility of itemsets is based upon the user's perspective, such as cost, profit, or revenue, and is of significant importance. Utility mining considers using utility factors in data mining tasks. Utility-...

High Utility Rare Itemset Mining over Transaction Databases

High-Utility Rare Itemset (HURI) mining finds itemsets from a database which have their utility no less than a given minimum utility threshold and have their support less than a given frequency threshold. Identifying high-utility rare itemsets from a database can help in better business decision making by highlighting the rare itemsets which give high profits so that they can be marketed more t...

A Distributed Approach to Extract High Utility Itemsets from XML Data

This paper investigates a new data mining capability that entails mining High Utility Itemsets (HUI) in a distributed environment. Existing research in data mining deals only with the presence or absence of items and does not consider semantic measures like the weight or cost of items. Thus, HUI mining algorithms have evolved. HUI mining, one kind of utility mining, aims to iden...
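The related article "A New Algorithm for High Average-utility Itemset Mining" notes that basic high-utility itemset mining ignores itemset length, so longer itemsets tend to score higher. A common remedy in that line of work is average utility: the itemset's utility divided by its number of items. The Python sketch below illustrates the effect. The toy data and function names are hypothetical and are not taken from either paper.

```python
# Hypothetical toy data: unit profit per item; transactions map item -> quantity.
PROFIT = {"A": 5, "B": 2, "C": 1}
DB = [{"A": 1, "C": 2}, {"A": 2, "B": 1, "C": 1}, {"B": 3, "C": 10}]


def utility(itemset, db):
    """Sum of quantity * profit of the itemset's items over transactions containing all of them."""
    return sum(
        sum(txn[item] * PROFIT[item] for item in itemset)
        for txn in db
        if all(item in txn for item in itemset)
    )


def average_utility(itemset, db):
    """Average utility: the itemset's utility divided by its number of items."""
    return utility(itemset, db) / len(itemset)


for itemset in [("A",), ("A", "C"), ("A", "B", "C")]:
    print(itemset, utility(itemset, DB), round(average_utility(itemset, DB), 2))
```

Adding C to A raises the plain utility from 15 to 18 but lowers the average utility from 15 to 9. The longer itemset therefore no longer wins merely by containing more items.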

Publication date: 2009